\documentclass[a4paper,12pt]{article}

\usepackage{setspace, amssymb, amsmath, xspace}
\usepackage[left=1in,right=1in,top=2cm, bottom=4cm]{geometry}
\usepackage{amsfonts}
\usepackage{subfigure}
\usepackage{graphicx,float}
\usepackage{xcolor}
\usepackage{natbib}
\usepackage{pdfsync}

\doublespacing

\DeclareMathOperator{\E}{E}
\bibliographystyle{apalike}
\newcommand{\kfun}{$K$-function\xspace}
\newcommand{\khat}[1]{\hat{K}_{#1}}

\title{Assignment on Spatial Point Processes}
\author{P. A. Henrys and P. E. Brown}
\date{Due 31 October, 5pm}

\begin{document}

\maketitle

\section{Introduction}

In many epidemiological studies the main objective is to trace a link between environmental factors and the prevalence of a particular disease. A spatial point process analysis can help answer such questions by examining the spatial distribution of cases of the disease. We wish to assess whether the observed cases are clustered after accounting for known factors that affect the distribution but are of little scientific interest, for example changes in population dynamics. Your main objective in this assignment is therefore to determine whether there is any residual clustering in the data provided, which will be done using the $K$-function.
\newline
\newline
Before you start: 
\begin{itemize}
\item Install and load the {\tt spatstat}, {\tt splancs} and {\tt spatialkernel} libraries in {\tt R}: install them using the `Packages' menu, then load each one by typing, for example, {\tt library(spatstat)} on the command line. These packages contain the functions needed to estimate the inhomogeneous $K$-function.
\item Download the file {\tt sppassignment.RData} from \verb!http://code.google.com/p/spatialcourse/source/browse/#svn/slides/data! and load it into {\tt R} with {\tt load('sppassignment.RData')}. This file contains the data for this assignment.
\end{itemize}
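The setup steps above might look like the following in {\tt R}. This is only a sketch: it assumes that all three packages are available from your CRAN mirror and that {\tt sppassignment.RData} has been saved in the current working directory. The variable names {\tt pkgs} and {\tt needed} are ours.
\begin{verbatim}
# Packages named in the assignment
pkgs = c("spatstat", "splancs", "spatialkernel")

# Install any that are not yet present (assumes a working CRAN mirror)
needed = pkgs[!sapply(pkgs, requireNamespace, quietly=TRUE)]
if (length(needed) > 0) install.packages(needed)

library(spatstat)
library(splancs)
library(spatialkernel)

# Assumes the data file is in the working directory; see getwd()
load("sppassignment.RData")
ls()   # list the objects now in the workspace
\end{verbatim}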

The object {\tt EcsData} contains the locations of {\tt X} cases of cancer and {\tt Y} controls recorded in Ontario, provided by Cancer Care Ontario. Its columns are as follows: {\tt x} and {\tt y} give the location of an event in km from an unknown origin, which maintains a degree of confidentiality; {\tt Y} takes the value $1$ if the row corresponds to a case and $0$ if it corresponds to a control; and {\tt int} contains the estimated intensity of the process evaluated at that location. In addition to the locations of cases and controls, we have information on the population density in each census subdivision, which we have used to produce a map of the intensity. This map is stored in the form of a matrix in the object {\tt EstInt}. Finally, the object {\tt SthOnt} contains a polygon of the boundary of the region of Ontario under study.

\section*{Task 1}
\begin{itemize}
\item[a)] Plot the locations of the controls together with the polygon of the boundary.
\item[b)] Add the locations of the cases to this plot, giving them a different colour and point type from the controls.
\item[c)] Comment on the plots produced. Do the spatial point patterns appear homogeneous to you? Why do you think this is or is not the case?
\end{itemize}

\section*{Task 2}
The object {\tt EstInt}, which contains the estimated intensity of the spatial point process, is given in the form of an image object.
\begin{itemize}
\item[a)] Use the {\tt image} command in {\tt R} to produce a plot of the intensity surface. Use {\tt ?image} for further assistance. 
\item[b)] Add a line plot of the boundary of the region ({\tt SthOnt}). The intensity goes slightly beyond the boundary of the region, but this can be ignored.
\item[c)] Describe some interesting features of the plot and suggest an explanation for them.
\end{itemize} 

\section*{Task 3}
Estimating the inhomogeneous $K$-function. 
\begin{itemize}
\item[a)] First, create a vector of distances at which to calculate the $K$-function. Use your plots from Task 2 to help you decide on the maximum distance, and make sure the vector has a suitable length. \textit{Hint: use the} {\tt seq} \textit{command to help you do this.}
\item[b)] Type {\tt ?kinhat} to view the help file for the {\tt kinhat} function, which estimates the inhomogeneous $K$-function, and use it to estimate the $K$-function for the cases only. \textit{Hint: the point locations and polygon coordinates need to be passed through the {\tt as.points} command, e.g.\ {\tt as.points(poly)}.}
\end{itemize}
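It may help to keep in mind what is being estimated. A commonly used form of the inhomogeneous $K$-function estimator is sketched below. We have not checked that {\tt kinhat} implements exactly this form, and the symbols $A$, $w_{ij}$ and $\hat\lambda$ are our notation rather than the help file's.
\begin{equation*}
\hat{K}(s) = \frac{1}{|A|} \sum_{i=1}^{n} \sum_{j \neq i} \frac{\mathbf{1}\left(\|x_i - x_j\| \le s\right)}{w_{ij}\,\hat\lambda(x_i)\,\hat\lambda(x_j)}
\end{equation*}
Here $x_1, \dots, x_n$ are the observed locations, $|A|$ is the area of the study region $A$, $\hat\lambda(x_i)$ is the estimated intensity at $x_i$ (the {\tt int} column of {\tt EcsData}), and $w_{ij}$ is an edge-correction weight. Each pair of points closer than $s$ is down-weighted by the product of the intensities at its two locations, which is why the estimator needs the intensity evaluated at every point.
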

Now that we have an estimate of the $K$-function for the cases, we can compare it with what we would expect under independence between locations, i.e.\ under no clustering.
\begin{itemize}
\item[c)] Produce a plot of the estimated $K$-function against the vector of distances at which it was evaluated.
\item[d)] What is the theoretical inhomogeneous $K$-function under independence between the event locations?
\item[e)] Add a line of this theoretical $K$-function to your plot using a different colour.
\item[f)] Comment on your plot. 
\end{itemize}

\section*{Task 4}
\begin{itemize}
\item[a)] Perform a parametric bootstrap on the data to estimate an envelope for the $K$-function under the null hypothesis.
\item[b)] Use the {\tt envelope} function to do this. Type {\tt ?envelope} to view the help file. 
\item[c)] To use this function, type the following:
\begin{verbatim}
envel = envelope(intmodel, nsim=99, verbose=FALSE, fun=Kinhom, lambda=estint)
\end{verbatim}
\emph{Note: The objects you need to do this are all already in the workspace.}
\item[d)] Comment on how you could write your own algorithm to perform a parametric bootstrap in a similar way.
\end{itemize}
\section*{Task 5}
\begin{itemize}
\item[a)] Repeat all the $K$-function analysis above for the controls. 
\item[b)] Comment on your findings and compare the results from the cases to the controls. Do you notice anything interesting? 
\item[c)] Why might we wish to include controls in the analysis?
\end{itemize}

\section*{Task 6}
\begin{itemize}
\item[a)] Plot the estimated $K$-function for both the cases and controls on the same plot, and include the theoretical $K$-function and the confidence envelopes for both the cases and controls. Vary the line types and colours to make the plot easy to read.
\end{itemize}

\section*{Task 7}
\begin{itemize}
\item[a)] Use the function {\tt khat} to produce an estimate of the homogeneous $K$-function for both the cases and controls. 
\item[b)] Add these estimates to the plot produced in Task 6 and comment on your findings.
\item[c)] Why does the homogeneous $K$-function produce results like this? 
\end{itemize}

\section*{Hints}
\begin{verbatim}
library(splancs)
library(spatstat)
library(sp)
library(spatialkernel)

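# Rows of EcsData corresponding to cases (Y == 1)
thecases = EcsData[EcsData$Y==1,]

# Case locations as a spatstat point pattern, marked with the intensity
caseppp = ppp(thecases$x, thecases$y, window=as.owin(EstInt),
	marks=thecases$int)

# Task 1: boundary polygon with the cases overlaid
plot(SthOnt, type='l')
points(thecases[,c('x', 'y')], col='red')

# Task 2: intensity surface with the boundary overlaid
image(EstInt)
lines(SthOnt)

# Task 3, spatstat version (overwritten by the kinhat estimate below)
casek = Kinhom(caseppp)

# Task 3, kinhat version; the second argument is the intensity at each
# point, assumed here to be the int column of EcsData
casek = kinhat(as.points(thecases[,c('x','y')]), thecases$int,
	as.points(SthOnt), seq(0, 100, len=5))
plot(casek$s, casek$k)
# Theoretical K-function under independence
lines(casek$s, pi*casek$s^2)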

\end{verbatim}
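
For Task 4, the idea of a simulation envelope can be illustrated without any spatial packages. The sketch below is not the {\tt envelope} function and does not use the assignment data: it works on the unit square with a made-up intensity, ignores edge effects, and the names {\tt lambdaFun}, {\tt simPoisson} and {\tt kInhomNaive} are hypothetical.
\begin{verbatim}
set.seed(1)

# Assumed intensity surface on the unit square (illustration only)
lambdaFun = function(x, y) 200 * exp(-2 * x)
lambdaMax = 200   # upper bound of lambdaFun on the unit square

# Simulate an inhomogeneous Poisson process by thinning:
# generate a homogeneous process at rate lambdaMax, then keep
# each point with probability lambdaFun / lambdaMax
simPoisson = function() {
  n = rpois(1, lambdaMax)   # the unit square has area 1
  x = runif(n)
  y = runif(n)
  keep = runif(n) < lambdaFun(x, y) / lambdaMax
  cbind(x = x[keep], y = y[keep])
}

# Naive inhomogeneous K estimate with no edge correction:
# sum over ordered pairs within distance r of 1/(lambda_i * lambda_j),
# divided by the area of the region (1 here)
kInhomNaive = function(pts, s) {
  lam = lambdaFun(pts[, "x"], pts[, "y"])
  d = as.matrix(dist(pts))
  w = 1 / outer(lam, lam)
  diag(w) = 0               # exclude each point paired with itself
  sapply(s, function(r) sum(w[d <= r]))
}

s = seq(0.01, 0.15, length.out = 15)
nsim = 99

# One column per simulated pattern, one row per distance
sims = replicate(nsim, kInhomNaive(simPoisson(), s))
lower = apply(sims, 1, min)   # pointwise envelope limits
upper = apply(sims, 1, max)

# Stand-in for the estimate from real data
observed = kInhomNaive(simPoisson(), s)

plot(s, observed, type = "l", ylim = range(lower, upper, observed))
lines(s, lower, lty = 2)
lines(s, upper, lty = 2)
lines(s, pi * s^2, col = "red")   # theoretical value under independence
\end{verbatim}
Replacing the stand-in {\tt observed} with an estimate from real data, and {\tt lambdaFun} with the fitted intensity, gives the general shape of a hand-written parametric bootstrap. Because no edge correction is applied, the curves here should tend to fall below $\pi s^2$ at larger distances.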


\end{document}